Central Jutland
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- South America > Brazil (0.04)
- North America > United States > California (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
Scientists want you to smell ancient Egyptian mummies
A mixture of archeology and chemistry brings the aroma of mummification to museums. Visiting a museum could soon be a truly multisensory experience, smells included. Thanks to recent advances in the field of biomolecular archeology, scientists can now detect traces of molecular fingerprints on ancient artifacts. From these tiny particles, scientists can determine how the objects may have smelled.
- Africa > Middle East > Egypt (0.41)
- Europe > United Kingdom (0.05)
- Europe > Norway (0.05)
- Leisure & Entertainment (0.49)
- Media (0.30)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Oceania > New Zealand > South Island > Otago > Dunedin (0.04)
ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning
Potamitis, Nearchos, Klein, Lars, Arora, Akhil
Large language models (LLMs) are increasingly deployed in settings where reasoning, such as multi-step problem solving and chain-of-thought, is essential. Yet, current evaluation practices overwhelmingly report single-run accuracy while ignoring the intrinsic uncertainty that naturally arises from stochastic decoding. This omission creates a blind spot because practitioners cannot reliably assess whether a method's reported performance is stable, reproducible, or cost-consistent. We introduce ReasonBENCH, the first benchmark designed to quantify the underlying instability in LLM reasoning. ReasonBENCH provides (i) a modular evaluation library that standardizes reasoning frameworks, models, and tasks, (ii) a multi-run protocol that reports statistically reliable metrics for both quality and cost, and (iii) a public leaderboard to encourage variance-aware reporting. Across tasks from different domains, we find that the vast majority of reasoning strategies and models exhibit high instability. Notably, even strategies with similar average performance can display confidence intervals up to four times wider, and the top-performing methods often incur higher and less stable costs. Such instability compromises reproducibility across runs and, consequently, the reliability of reported performance. To better understand these dynamics, we further analyze the impact of prompts, model families, and scale on the trade-off between solve rate and stability. Our results highlight reproducibility as a critical dimension for reliable LLM reasoning and provide a foundation for future reasoning methods and uncertainty quantification techniques. ReasonBENCH is publicly available at https://github.com/au-clan/ReasonBench.
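The point about strategies with similar averages but very different confidence intervals can be illustrated with a small sketch. This is not the ReasonBENCH library or its protocol: the function name `bootstrap_ci`, the per-run accuracies, and the choice of a percentile bootstrap are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical per-run accuracies: both strategies average 0.70.
runs_a = [0.70, 0.71, 0.69, 0.70, 0.70]
runs_b = [0.60, 0.80, 0.65, 0.75, 0.70]

for name, runs in [("A", runs_a), ("B", runs_b)]:
    lo, hi = bootstrap_ci(runs)
    mean = statistics.fmean(runs)
    print(f"{name}: mean={mean:.3f}  95% CI=[{lo:.3f}, {hi:.3f}]")
```

A single-run report would make the two strategies look equivalent; the interval width is what separates them.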
Documenting SME Processes with Conversational AI: From Tacit Knowledge to BPMN
Small and medium-sized enterprises (SMEs) still depend heavily on tacit, experience-based know-how that rarely makes its way into formal documentation. This paper introduces a large-language-model (LLM)-driven conversational assistant that captures such knowledge on the shop floor and converts it incrementally and interactively into standards-compliant Business Process Model and Notation (BPMN) 2.0 diagrams. Powered by Gemini 2.5 Pro and delivered through a lightweight Gradio front-end with client-side bpmn-js visualisation, the assistant conducts an interview-style dialogue: it elicits process details, supports clarifying dialogue and on-demand analysis, and renders live diagrams that users can refine in real time. A proof-of-concept evaluation in an equipment-maintenance scenario shows that the chatbot produced an accurate "AS-IS" model, flagged issues via on-diagram annotations, and generated an improved "TO-BE" variant, all within about 12 minutes, while keeping API costs within an SME-friendly budget. The study analyses latency sources, model-selection trade-offs, and the challenges of enforcing strict XML schemas, then outlines a roadmap toward agentic and multimodal deployments. The results demonstrate that conversational LLMs can potentially lower the skill and cost barriers to rigorous process documentation, helping SMEs preserve institutional knowledge, enhance operational transparency, and accelerate continuous-improvement efforts.
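One concrete form the XML-enforcement challenge can take is a generated diagram whose sequence flows point at elements that do not exist. The sketch below is a hypothetical post-generation check, not the paper's implementation: the helper `dangling_flow_refs` and the sample process are invented for illustration, and only the BPMN 2.0 model namespace is taken from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 model elements.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def dangling_flow_refs(bpmn_xml):
    """Return messages for sequenceFlow refs that match no element id."""
    root = ET.fromstring(bpmn_xml)  # raises ParseError if not well-formed
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    problems = []
    for flow in root.iter(f"{{{BPMN_NS}}}sequenceFlow"):
        for attr in ("sourceRef", "targetRef"):
            ref = flow.get(attr)
            if ref not in ids:
                problems.append(f"{flow.get('id')}: {attr}={ref!r} not found")
    return problems


# Hypothetical model output: flow f2 targets a task that was never declared.
SAMPLE = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
             id="defs" targetNamespace="http://example.com/bpmn">
  <process id="maintenance">
    <startEvent id="start"/>
    <task id="inspect" name="Inspect equipment"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="inspect"/>
    <sequenceFlow id="f2" sourceRef="inspect" targetRef="repair"/>
  </process>
</definitions>"""

print(dangling_flow_refs(SAMPLE))
```

A check like this covers only referential integrity; full conformance would still require validation against the BPMN 2.0 XSD.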
- Workflow (0.91)
- Research Report > New Finding (0.48)
BUSTR: Breast Ultrasound Text Reporting with a Descriptor-Aware Vision-Language Model
Mohammed, Rawa, Attin, Mina, Shareef, Bryar
Automated radiology report generation (RRG) for breast ultrasound (BUS) is limited by the lack of paired image-report datasets and the risk of hallucinations from large language models. We propose BUSTR, a multitask vision-language framework that generates BUS reports without requiring paired image-report supervision. BUSTR constructs reports from structured descriptors (e.g., BI-RADS, pathology, histology) and radiomics features, learns descriptor-aware visual representations with a multi-head Swin encoder trained using a multitask loss over dataset-specific descriptor sets, and aligns visual and textual tokens via a dual-level objective that combines token-level cross-entropy with a cosine-similarity alignment loss between input and output representations. We evaluate BUSTR on two public BUS datasets, BrEaST and BUS-BRA, which differ in size and available descriptors. Across both datasets, BUSTR consistently improves standard natural language generation metrics and clinical efficacy metrics, particularly for key targets such as BI-RADS category and pathology. Our results show that this descriptor-aware vision model, trained with a combined token-level and alignment loss, improves both automatic report metrics and clinical efficacy without requiring paired image-report data. The source code can be found at https://github.com/AAR-UNLV/BUSTR
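The dual-level objective described above can be sketched in plain Python. This is an illustrative assumption, not the BUSTR code: the abstract does not say how the two terms are weighted or how the cosine term is defined, so the weight `lam` and the `1 - cosine` form are hypothetical choices, as are all function names.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one token: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def cosine_alignment_loss(u, v):
    """1 - cosine similarity between two representation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def dual_level_loss(token_logits, token_targets, in_repr, out_repr, lam=0.5):
    """Mean token-level cross-entropy plus a weighted alignment term.

    `lam` is a hypothetical weighting, not a value from the paper.
    """
    ce = sum(
        cross_entropy(l, t) for l, t in zip(token_logits, token_targets)
    ) / len(token_targets)
    return ce + lam * cosine_alignment_loss(in_repr, out_repr)


# Toy example: two tokens over a three-word vocabulary.
logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
targets = [0, 1]
print(dual_level_loss(logits, targets, [1.0, 0.0], [0.8, 0.6]))
```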
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Nevada (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > Denmark > Central Jutland > Aarhus (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
AI Workers, Geopolitics, and Algorithmic Collective Action
According to the theory of International Political Economy (IPE), states are often incentivized to rely on rather than constrain powerful corporations. For this reason, IPE provides a useful lens to explain why efforts to govern Artificial Intelligence (AI) at the international and national levels have thus far been developed, applied, and enforced unevenly. Building on recent work that explores how AI companies engage in geopolitics, this position paper argues that some AI workers can be considered actors of geopolitics. It makes the timely case that governance alone cannot ensure responsible, ethical, or robust AI development and use, and greater attention should be paid to bottom-up interventions at the site of AI development. AI workers themselves should be situated as individual agents of change, especially when considering their potential to foster Algorithmic Collective Action (ACA). Drawing on methods of Participatory Design (PD), this paper proposes engaging AI workers as sources of knowledge, relative power, and intentionality to encourage more responsible and just AI development and create the conditions that can facilitate ACA.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.46)
- Europe > Ukraine (0.14)
- Asia > Middle East > Israel (0.14)
- Law (1.00)
- Information Technology (1.00)
- Government > Regional Government > North America Government > United States Government (0.94)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
Continuous sentiment scores for literary and multilingual contexts
Lyngbaek, Laurits, Feldkamp, Pascale, Bizzoni, Yuri, Nielbo, Kristoffer, Enevoldsen, Kenneth
Sentiment analysis is widely used to quantify sentiment in text, but its application to literary texts poses unique challenges due to figurative language, stylistic ambiguity, and sentiment evocation strategies. Traditional dictionary-based tools often underperform, especially for low-resource languages, and transformer models, while promising, typically output coarse categorical labels that limit fine-grained analysis. We introduce a novel continuous sentiment scoring method based on concept vector projection, trained on multilingual literary data, which more effectively captures nuanced sentiment expressions across genres, languages, and historical periods. Our approach outperforms existing tools on English and Danish texts, producing sentiment scores whose distribution closely matches human ratings, enabling more accurate analysis and sentiment arc modeling in literature.
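One common way to realize concept vector projection is to take the difference between the mean embedding of positive examples and the mean embedding of negative examples, then score a text by projecting its embedding onto that direction. The sketch below assumes that construction; the abstract does not give the authors' exact procedure, and the two-dimensional toy embeddings and all function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def concept_vector(positive, negative):
    """Unit vector pointing from the negative mean to the positive mean."""
    diff = [p - q for p, q in zip(mean_vector(positive), mean_vector(negative))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def sentiment_score(embedding, concept):
    """Continuous score: scalar projection of the embedding onto the concept."""
    return sum(e * c for e, c in zip(embedding, concept))


# Toy 2-D "embeddings"; a real system would use a sentence encoder.
positive = [[1.0, 0.2], [0.8, 0.0]]
negative = [[-1.0, 0.1], [-0.8, 0.1]]
axis = concept_vector(positive, negative)

print(sentiment_score([0.5, 0.3], axis))    # mildly positive text
print(sentiment_score([-0.7, 0.4], axis))   # negative text
```

Because the score is a real number, not a class label, consecutive sentence scores can be plotted directly as a sentiment arc.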
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)